We can be happy and joyful, married or not!🎎👩

(An effective guidebook for handling the marriage pressure from family!!!)

In this project, I'll first present the questions and results from my data exploration. The detailed data processing steps are shown below my conclusion.

Growing up, I was always told by my parents that a complete and happy life meant getting married. So, in this project, I want to use data to explore the sources of happiness for people in different marital statuses. I'm particularly interested in whether staying unmarried and independent means one can't be happy.

The whole dataset: HappyDB is a corpus of 100,000 crowd-sourced happy moments. You can read more about it at https://megagon.ai/happydb-a-happiness-database-of-100000-happy-moments/. This report analyzes only a subset of that dataset.

First, I join 'cleaned_hm.csv' and 'demographic.csv' into a dataframe called 'join_df1'. To make data handling easier down the line, I convert 'marital', 'country', and 'predicted_category' into numerical codes for basic data exploration. I then select 'marital', 'cleaned_hm' (the text of each happy moment), and 'predicted_category', along with their numerical codes and 'gender', to create a new dataframe called 'join_df2'.

Question 1: Are only married people happy?

image-5.png

In the subset analyzed here (14,055 happy moments after cleaning), single people contributed the most moments (7,409). Married people came in second (6,040), followed by those who are divorced (465). Separated (84) and widowed (57) people contributed the fewest.

Judging solely from this corpus, it seems that single people tend to be happier, and married people also share a good amount of happiness. However, people whose marriages ended less favorably, through divorce, separation, or widowhood, contribute far fewer happy moments to the corpus.

Question 2: What are the primary types of happy moments experienced by people with different marital statuses?

image.png

For single and widowed individuals, happiness often focuses on their achievements. For those who are married, divorced, or separated, on the other hand, happiness tends to focus on affection.

Question 3: What are the main sources of happiness for people in different marital statuses?

image-4.png

For married individuals, their happiness largely stems from their children, their partner, and their work.

When married people feel happy, what they say most often goes something like this:

  • My husband brought toys for my son its fabulous its too good my son so happy am too
  • My husband made me my favorite food for dinner.
  • My wife and I took our older daughters to see "Hamilton" in New York this past weekend; we all loved it.
  • My close friend Subha called me yesterday evening and shared her happiness with me that her husband had got a salary hike along with promotion.

image-2.png

For single individuals, their happiness largely stems from their friends and their work.

When single people feel happy, what they say most often goes something like this:

  • I got promoted at work and my salary was increased.
  • I had a good workout.
  • I had lunch with friends.
  • I ran into an old friend who I haven't seen in a long time, and we had coffee and good conversation.
  • A friend of mine got hired for a new job that she really wanted.

image-2.png

For divorced individuals, their happiness largely stems from their work and children.

When divorced people feel happy, what they say most often goes something like this:

  • I made vacation plans with my daughter today for Florida in July.
  • I picked my daughter up from the airport and we have a fun and good conversation on the way home.
  • My teenage daughter finished her hardest exams to complete her junior year of high school and she was very talkative to me about it.
  • On Monday night I attended an awards ceremony at my teenage daughter's high school and she won numerous top academic awards.
  • I worked on a project that will benefit people with disabilities.
  • I had a good workout.
  • I managed to squeeze in a good workout between work tasks.

image-2.png

For those who are separated, their moments of joy tend to center more around the company of friends and the value they place on their personal time.

When separated people feel happy, what they say most often goes something like this:

  • My friend and I took a long walk.
  • I invited my friends over.
  • I was happy last week to go to Las Vegas and spend some kid free time with my best friend and her aunts.
  • I cooked last week. I roasted a duck. It is the first time I was able to do it perfectly. I feel good and happy. My family enjoyed it.
  • Meeting my best friend who is so much like me, and getting to hang out all the time with her.
  • I got to see my best friend for the first time in over a month. I've really missed her - like missing a part of myself (my better half practically.)

image.png

For those who are widowed, their moments of happiness often stem from their house and children.

When widowed people feel happy, what they say most often goes something like this:

  • The day I cleaned the entire house and cooked a great meal for my family.
  • Having lost my deaf dog somewhere in the house or garden and searching for her for 15 min utes. I was happy to find her in the bathroom happily cleaning the toilet seat, she was very pleased with her self. The look on her face still makes me smile
  • My son surprised me this morning by waking me up to his home made pancakes breakfast.
  • My son bought an english bulldog puppy.

Conclusion

Whether we are in a marital relationship or not, we can all find happiness; it's just that the sources of joy differ. However, since single people contribute the most to the corpus of happy moments in this database, I can reasonably infer that one can be very happy without ever marrying! At the same time, given that widowed, separated, and divorced individuals contribute fewer happy moments, it's a reasonable assumption that an unhappy marriage might reduce the frequency of joyful moments.

And I wanna say, mom and dad, I can be very happy on my own, too. I find a lot of joy in my friends and my work, and I lead a very fulfilling life.

Detailed Data Processing

Importing Data

In [1]:
import pandas as pd
In [2]:
CDF = pd.read_csv('/Users/janicemeng/Desktop/Project1/data/data/cleaned_hm.csv')
CDF.head()
Out[2]:
hmid wid reflection_period original_hm cleaned_hm modified num_sentence ground_truth_category predicted_category
0 27673 2053 24h I went on a successful date with someone I fel... I went on a successful date with someone I fel... True 1 NaN affection
1 27674 2 24h I was happy when my son got 90% marks in his e... I was happy when my son got 90% marks in his e... True 1 NaN affection
2 27675 1936 24h I went to the gym this morning and did yoga. I went to the gym this morning and did yoga. True 1 NaN exercise
3 27676 206 24h We had a serious talk with some friends of our... We had a serious talk with some friends of our... True 2 bonding bonding
4 27677 6227 24h I went with grandchildren to butterfly display... I went with grandchildren to butterfly display... True 1 NaN affection
In [3]:
Demographic = pd.read_csv('/Users/janicemeng/Desktop/Project1/data/data/demographic.csv')
Demographic.head()
Out[3]:
wid age country gender marital parenthood
0 1 37.0 USA m married y
1 2 29.0 IND m married y
2 3 25 IND m single n
3 4 32 USA m married y
4 5 29 USA m married y
In [4]:
Senselabel =pd.read_csv('/Users/janicemeng/Desktop/Project1/data/data/senselabel.csv')
Senselabel.head()
Out[4]:
hmid tokenOffset word lowercaseLemma POS MWE offsetParent supersenseLabel
0 31526 1 I i PRON O 0 NaN
1 31526 2 found find VERB O 0 v.cognition
2 31526 3 a a DET O 0 NaN
3 31526 4 silver silver ADJ O 0 NaN
4 31526 5 coin coin NOUN O 0 n.artifact

Data Cleaning

In [5]:
join_df1 = pd.merge(CDF, Demographic, on='wid', how='inner')
join_df1.head()
Out[5]:
hmid wid reflection_period original_hm cleaned_hm modified num_sentence ground_truth_category predicted_category age country gender marital parenthood
0 27673 2053 24h I went on a successful date with someone I fel... I went on a successful date with someone I fel... True 1 NaN affection 35 USA m single n
1 27873 2053 24h I played a new game that was fun and got to en... I played a new game that was fun and got to en... True 1 NaN leisure 35 USA m single n
2 28073 2053 24h I listened to some music and heard an entire a... I listened to some music and heard an entire a... True 1 NaN leisure 35 USA m single n
3 33522 2053 24h Went to see a movie with my friend Went to see a movie with my friend True 1 NaN bonding 35 USA m single n
4 34522 2053 24h Played guitar, learning a song on it Played guitar, learning a song on it True 1 NaN leisure 35 USA m single n
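The join above relies on each 'wid' appearing only once in the demographic table. A minimal sketch of how that assumption could be checked during the merge, using made-up toy rows (the frames `moments` and `workers` and their values are illustrative stand-ins, not the real CSV files):

```python
import pandas as pd

# Toy stand-ins for cleaned_hm.csv and demographic.csv (values are made up).
moments = pd.DataFrame({
    "hmid": [1, 2, 3],
    "wid": [10, 10, 11],
    "cleaned_hm": [
        "I had lunch with friends.",
        "I had a good workout.",
        "My son made pancakes.",
    ],
})
workers = pd.DataFrame({
    "wid": [10, 11],
    "marital": ["single", "widowed"],
})

# validate="many_to_one" makes pandas raise a MergeError if any wid
# appears more than once in the right-hand (demographic) table.
joined = pd.merge(moments, workers, on="wid", how="inner", validate="many_to_one")
print(joined[["hmid", "wid", "marital"]])
```

If the check fails, the duplicate rows in the demographic table would need to be resolved before the join, otherwise happy moments would be counted more than once.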
In [6]:
join_df1.to_csv('/Users/janicemeng/Desktop/Project1/output_to_joined_csv.csv', index = False)
In [7]:
missing_values = join_df1.isnull().sum()
print(missing_values)
hmid                         0
wid                          0
reflection_period            0
original_hm                  0
cleaned_hm                   0
modified                     0
num_sentence                 0
ground_truth_category    86410
predicted_category           0
age                         93
country                    203
gender                      79
marital                    157
parenthood                  78
dtype: int64
In [8]:
join_df1 = join_df1.dropna()
In [9]:
missing_values = join_df1.isnull().sum()
print(missing_values)
hmid                     0
wid                      0
reflection_period        0
original_hm              0
cleaned_hm               0
modified                 0
num_sentence             0
ground_truth_category    0
predicted_category       0
age                      0
country                  0
gender                   0
marital                  0
parenthood               0
dtype: int64
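The missing-value counts above show that 'ground_truth_category' is empty for 86,410 rows, so calling `dropna()` with no arguments removes all of those rows too. If only some columns matter for a given question, `dropna(subset=...)` is one option. A minimal sketch on made-up toy rows (the frame `toy` and its values are assumptions, not HappyDB data):

```python
import numpy as np
import pandas as pd

# Toy frame with the same kind of gaps as join_df1 (values are made up).
toy = pd.DataFrame({
    "cleaned_hm": ["a", "b", "c", "d"],
    "ground_truth_category": [np.nan, "bonding", np.nan, np.nan],
    "marital": ["single", "married", np.nan, "divorced"],
})

# Drops a row if ANY column is missing.
all_cols = toy.dropna()

# Drops a row only if 'marital' is missing.
needed_only = toy.dropna(subset=["marital"])

print(len(all_cols), len(needed_only))
```

Which columns belong in `subset` depends on the question being asked; the choice here is only for illustration.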
In [10]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [11]:
import io
import re
import string
import tqdm
import numpy as np
import multiprocessing
from gensim.models import Word2Vec
In [12]:
print(join_df1.columns)
Index(['hmid', 'wid', 'reflection_period', 'original_hm', 'cleaned_hm',
       'modified', 'num_sentence', 'ground_truth_category',
       'predicted_category', 'age', 'country', 'gender', 'marital',
       'parenthood'],
      dtype='object')
In [13]:
#Converts the 'predicted_category' column into a categorical data type, creates a dictionary mapping numerical codes to category labels, and then adds a new column 'predicted_category_code' holding the numerical code of each category.
In [14]:
join_df1['predicted_category'] = join_df1['predicted_category'].astype('category')
category_to_code = dict(enumerate(join_df1['predicted_category'].cat.categories))
print("Category to Code:", category_to_code)
join_df1['predicted_category_code'] = join_df1['predicted_category'].cat.codes
Category to Code: {0: 'achievement', 1: 'affection', 2: 'bonding', 3: 'enjoy_the_moment', 4: 'exercise', 5: 'leisure', 6: 'nature'}
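Note that the printed dictionary maps each code to its label. When a lookup in the other direction is needed, the mapping can be inverted. A small sketch on a made-up series (the series `cats` and the name `code_to_category` are illustrative assumptions):

```python
import pandas as pd

# Toy categorical column (values are made up).
cats = pd.Series(["affection", "achievement", "bonding", "affection"]).astype("category")

# enumerate() yields (code, label) pairs, so this dict goes code -> label.
code_to_category = dict(enumerate(cats.cat.categories))

# Invert it to look up the code for a given label.
category_to_code = {label: code for code, label in code_to_category.items()}

print(code_to_category)
print(category_to_code)
```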
In [15]:
sns.countplot(x='predicted_category_code',data=join_df1, palette='pastel')
Out[15]:
<Axes: xlabel='predicted_category_code', ylabel='count'>
In [16]:
#In this subset of the HappyDB corpus, 'affection' is the most frequent category, followed by 'achievement'. The third most common category is 'enjoy_the_moment'
In [17]:
join_df1['marital'] = join_df1['marital'].astype('category')
marital_to_code = dict(enumerate(join_df1['marital'].cat.categories))
print("marital to Code:", marital_to_code)
join_df1['marital_code'] = join_df1['marital'].cat.codes
marital to Code: {0: 'divorced', 1: 'married', 2: 'separated', 3: 'single', 4: 'widowed'}
In [18]:
join_df1['country'] = join_df1['country'].astype('category')
country_to_code = dict(enumerate(join_df1['country'].cat.categories))
print("country to Code:", country_to_code)
join_df1['country_code'] = join_df1['country'].cat.codes
country to Code: {0: 'AFG', 1: 'ALB', 2: 'ARE', 3: 'ARG', 4: 'ARM', 5: 'ASM', 6: 'AUS', 7: 'AUT', 8: 'BEL', 9: 'BGD', 10: 'BGR', 11: 'BHS', 12: 'BRA', 13: 'CAN', 14: 'CHL', 15: 'COL', 16: 'CRI', 17: 'CYP', 18: 'CZE', 19: 'DEU', 20: 'DNK', 21: 'DOM', 22: 'DZA', 23: 'ECU', 24: 'EGY', 25: 'ESP', 26: 'EST', 27: 'ETH', 28: 'FIN', 29: 'FRA', 30: 'GBR', 31: 'GHA', 32: 'GMB', 33: 'GRC', 34: 'GTM', 35: 'HRV', 36: 'IDN', 37: 'IND', 38: 'IRL', 39: 'ISL', 40: 'ITA', 41: 'JAM', 42: 'JPN', 43: 'KEN', 44: 'KNA', 45: 'KWT', 46: 'LKA', 47: 'LTU', 48: 'LVA', 49: 'MAC', 50: 'MDA', 51: 'MEX', 52: 'MKD', 53: 'MLT', 54: 'MYS', 55: 'NGA', 56: 'NIC', 57: 'NLD', 58: 'NOR', 59: 'NZL', 60: 'PAK', 61: 'PER', 62: 'PHL', 63: 'POL', 64: 'PRI', 65: 'PRT', 66: 'ROU', 67: 'RUS', 68: 'SGP', 69: 'SRB', 70: 'SVN', 71: 'SWE', 72: 'TCA', 73: 'THA', 74: 'TTO', 75: 'TUR', 76: 'TWN', 77: 'UGA', 78: 'UMI', 79: 'URY', 80: 'USA', 81: 'VEN', 82: 'VNM', 83: 'ZAF'}
In [19]:
join_df1['parenthood'] = join_df1['parenthood'].astype('category')
parenthood_to_code = dict(enumerate(join_df1['parenthood'].cat.categories))
print("parenthood to Code:", parenthood_to_code)
join_df1['parenthood_code'] = join_df1['parenthood'].cat.codes
parenthood to Code: {0: 'n', 1: 'y'}
In [20]:
join_df1['gender'] = join_df1['gender'].astype('category')
gender_to_code = dict(enumerate(join_df1['gender'].cat.categories))
print("gender to Code:", gender_to_code)
join_df1['gender_code'] = join_df1['gender'].cat.codes
gender to Code: {0: 'f', 1: 'm', 2: 'o'}
In [21]:
join_df1 = join_df1.drop(['reflection_period', 'original_hm', 'modified', 'num_sentence', 'ground_truth_category'], axis=1)
In [22]:
number_of_rows = join_df1.shape[0]
number_of_rows
Out[22]:
14055
In [23]:
join_df2 = join_df1[['marital','marital_code','cleaned_hm','predicted_category','predicted_category_code','gender']]
join_df2
Out[23]:
marital marital_code cleaned_hm predicted_category predicted_category_code gender
6 single 3 I played a game for about half an hour. leisure 5 m
15 married 1 When my family plan a abroad tour with me affection 1 m
19 married 1 When my house ready to live with my family affection 1 m
23 married 1 When my friend meet me today with expensive gi... bonding 2 m
25 married 1 I was very happy when my son playing with whol... affection 1 m
... ... ... ... ... ... ...
100494 married 1 My tooth stopped aching after my dentist visit. achievement 0 f
100496 married 1 I took a bath with my husband. affection 1 f
100526 married 1 I got on the scales in the morning and I was 5... achievement 0 f
100529 married 1 Quite dinner with my wife. affection 1 m
100533 married 1 Yesterday evening I received a call from unkno... bonding 2 m

14055 rows × 6 columns

Data Exploring

In [24]:
sns.countplot(x='marital',data=join_df2, palette='pastel')
Out[24]:
<Axes: xlabel='marital', ylabel='count'>
In [25]:
marital_counts = join_df2['marital'].value_counts()
print(marital_counts)
single       7409
married      6040
divorced      465
separated      84
widowed        57
Name: marital, dtype: int64
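One caveat when reading these raw counts: they also depend on how many contributors fall into each group. A minimal sketch, on made-up toy rows, of how moments per contributor could be computed from the 'wid' and 'marital' columns (the frame `toy` and the column names `moments`, `contributors`, and `moments_per_contributor` are illustrative assumptions):

```python
import pandas as pd

# Toy rows: one row per happy moment, 'wid' identifies the contributor.
toy = pd.DataFrame({
    "wid": [1, 1, 1, 2, 3, 3],
    "marital": ["single", "single", "single", "single", "married", "married"],
})

# Count moments and distinct contributors per marital status.
summary = toy.groupby("marital").agg(
    moments=("wid", "count"),
    contributors=("wid", "nunique"),
)
summary["moments_per_contributor"] = summary["moments"] / summary["contributors"]
print(summary)
```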
In [26]:
ax = sns.countplot(data=join_df2, x='predicted_category', hue='marital')
plt.title('Predicted Category Distribution by Marital Status')
plt.xlabel('Predicted Category')
plt.ylabel('Count')
plt.legend(title='Marital Status')
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, ha='right') 
Out[26]:
[Text(0, 0, 'achievement'),
 Text(1, 0, 'affection'),
 Text(2, 0, 'bonding'),
 Text(3, 0, 'enjoy_the_moment'),
 Text(4, 0, 'exercise'),
 Text(5, 0, 'leisure'),
 Text(6, 0, 'nature')]
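Because the groups differ a lot in size, the raw counts in this plot are dominated by single and married people. A sketch of how the within-group share of each category could be tabulated instead, using `pd.crosstab` with `normalize="index"` on made-up toy rows (the frame `toy` is an assumption, not the real data):

```python
import pandas as pd

# Toy rows (values are made up).
toy = pd.DataFrame({
    "marital": ["single", "single", "single", "married", "married", "widowed"],
    "predicted_category": [
        "achievement", "achievement", "bonding",
        "affection", "affection", "achievement",
    ],
})

# normalize="index" turns each row (marital status) into proportions summing to 1.
shares = pd.crosstab(toy["marital"], toy["predicted_category"], normalize="index")
print(shares.round(2))
```

Each row then shows what fraction of a group's happy moments falls into each category, which makes small groups such as widowed comparable with large ones.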
In [27]:
#For single and widowed individuals, their happiness is often derived more from their achievements. On the other hand, for those who are married, divorced, or separated, their happiness tends to come more from affection
In [28]:
join_df3 = join_df1[['gender','cleaned_hm','predicted_category']]
join_df3.head()
Out[28]:
gender cleaned_hm predicted_category
6 m I played a game for about half an hour. leisure
15 m When my family plan a abroad tour with me affection
19 m When my house ready to live with my family affection
23 m When my friend meet me today with expensive gi... bonding
25 m I was very happy when my son playing with whol... affection
In [29]:
sns.countplot(x='gender',data=join_df3, palette='pastel')
Out[29]:
<Axes: xlabel='gender', ylabel='count'>
In [30]:
join_df4 = join_df1[['country','cleaned_hm']]
Top_Five_Countries = join_df4['country'].value_counts().head(5)
Top_Five_Countries
Out[30]:
USA    10477
IND     2954
VEN       71
CAN       70
GBR       54
Name: country, dtype: int64

Natural Language Processing

In [31]:
!pip install wordcloud
Requirement already satisfied: wordcloud in /Users/janicemeng/anaconda3/lib/python3.11/site-packages (1.9.3)
Requirement already satisfied: numpy>=1.6.1 in /Users/janicemeng/anaconda3/lib/python3.11/site-packages (from wordcloud) (1.24.3)
Requirement already satisfied: pillow in /Users/janicemeng/anaconda3/lib/python3.11/site-packages (from wordcloud) (9.4.0)
Requirement already satisfied: matplotlib in /Users/janicemeng/anaconda3/lib/python3.11/site-packages (from wordcloud) (3.7.1)
Requirement already satisfied: contourpy>=1.0.1 in /Users/janicemeng/anaconda3/lib/python3.11/site-packages (from matplotlib->wordcloud) (1.0.5)
Requirement already satisfied: cycler>=0.10 in /Users/janicemeng/anaconda3/lib/python3.11/site-packages (from matplotlib->wordcloud) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in /Users/janicemeng/anaconda3/lib/python3.11/site-packages (from matplotlib->wordcloud) (4.25.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /Users/janicemeng/anaconda3/lib/python3.11/site-packages (from matplotlib->wordcloud) (1.4.4)
Requirement already satisfied: packaging>=20.0 in /Users/janicemeng/anaconda3/lib/python3.11/site-packages (from matplotlib->wordcloud) (23.0)
Requirement already satisfied: pyparsing>=2.3.1 in /Users/janicemeng/anaconda3/lib/python3.11/site-packages (from matplotlib->wordcloud) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7 in /Users/janicemeng/anaconda3/lib/python3.11/site-packages (from matplotlib->wordcloud) (2.8.2)
Requirement already satisfied: six>=1.5 in /Users/janicemeng/anaconda3/lib/python3.11/site-packages (from python-dateutil>=2.7->matplotlib->wordcloud) (1.16.0)
In [32]:
import plotly.express as px
from urllib import request
from wordcloud import WordCloud, STOPWORDS 
import csv
import bs4
from tqdm.notebook import trange, tqdm
In [33]:
#Splitting each word, removing stopwords and punctuation
In [34]:
join_df2.loc[:, 'words'] = join_df2['cleaned_hm'].str.split()
/var/folders/vp/pthgq5853bq0jz4mbv9ykfcm0000gn/T/ipykernel_4305/2220580772.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  join_df2.loc[:, 'words'] = join_df2['cleaned_hm'].str.split()
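The SettingWithCopyWarning above appears because 'join_df2' was created by selecting columns from 'join_df1', so pandas cannot tell whether the assignment is meant to affect the original frame. One common way to avoid it is to take an explicit copy when creating the subset. A minimal sketch on made-up rows (the frames `base` and `subset` are illustrative stand-ins for 'join_df1' and 'join_df2'):

```python
import pandas as pd

# Toy stand-in for join_df1 (values are made up).
base = pd.DataFrame({
    "marital": ["single", "married"],
    "cleaned_hm": ["I had lunch with friends.", "Quiet dinner with my wife."],
    "age": [25, 40],
})

# .copy() makes the subset an independent DataFrame, so adding a column
# to it is unambiguous and leaves 'base' untouched.
subset = base[["marital", "cleaned_hm"]].copy()
subset["words"] = subset["cleaned_hm"].str.split()

print(subset)
```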
In [35]:
join_df2['words']
Out[35]:
6         [I, played, a, game, for, about, half, an, hour.]
15        [When, my, family, plan, a, abroad, tour, with...
19        [When, my, house, ready, to, live, with, my, f...
23        [When, my, friend, meet, me, today, with, expe...
25        [I, was, very, happy, when, my, son, playing, ...
                                ...                        
100494    [My, tooth, stopped, aching, after, my, dentis...
100496               [I, took, a, bath, with, my, husband.]
100526    [I, got, on, the, scales, in, the, morning, an...
100529                     [Quite, dinner, with, my, wife.]
100533    [Yesterday, evening, I, received, a, call, fro...
Name: words, Length: 14055, dtype: object
In [36]:
import nltk
from nltk.corpus import stopwords
In [37]:
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/janicemeng/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
In [38]:
custom_stop_words = ['got', 'day', 'went', 'one', 'today', 'happy','made','able','found','month','new']
stop_words.update(custom_stop_words)
In [39]:
#stop_words
In [40]:
def remove_stopwords(word_list):
    return [word for word in word_list if word.lower() not in stop_words]
In [41]:
join_df2.loc[:, 'words_joined'] = join_df2['words'].apply(remove_stopwords)
/var/folders/vp/pthgq5853bq0jz4mbv9ykfcm0000gn/T/ipykernel_4305/2137858264.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  join_df2.loc[:, 'words_joined'] = join_df2['words'].apply(remove_stopwords)
In [42]:
join_df2.head()
Out[42]:
marital marital_code cleaned_hm predicted_category predicted_category_code gender words words_joined
6 single 3 I played a game for about half an hour. leisure 5 m [I, played, a, game, for, about, half, an, hour.] [played, game, half, hour.]
15 married 1 When my family plan a abroad tour with me affection 1 m [When, my, family, plan, a, abroad, tour, with... [family, plan, abroad, tour]
19 married 1 When my house ready to live with my family affection 1 m [When, my, house, ready, to, live, with, my, f... [house, ready, live, family]
23 married 1 When my friend meet me today with expensive gi... bonding 2 m [When, my, friend, meet, me, today, with, expe... [friend, meet, expensive, gift]
25 married 1 I was very happy when my son playing with whol... affection 1 m [I, was, very, happy, when, my, son, playing, ... [son, playing, whole]
In [43]:
join_df2.loc[:, 'words_joined'] = join_df2['words_joined'].apply(' '.join)
/var/folders/vp/pthgq5853bq0jz4mbv9ykfcm0000gn/T/ipykernel_4305/2789987770.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  join_df2.loc[:, 'words_joined'] = join_df2['words_joined'].apply(' '.join)
In [44]:
join_df2.head()
Out[44]:
marital marital_code cleaned_hm predicted_category predicted_category_code gender words words_joined
6 single 3 I played a game for about half an hour. leisure 5 m [I, played, a, game, for, about, half, an, hour.] played game half hour.
15 married 1 When my family plan a abroad tour with me affection 1 m [When, my, family, plan, a, abroad, tour, with... family plan abroad tour
19 married 1 When my house ready to live with my family affection 1 m [When, my, house, ready, to, live, with, my, f... house ready live family
23 married 1 When my friend meet me today with expensive gi... bonding 2 m [When, my, friend, meet, me, today, with, expe... friend meet expensive gift
25 married 1 I was very happy when my son playing with whol... affection 1 m [I, was, very, happy, when, my, son, playing, ... son playing whole
In [45]:
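# Helper for cleaning punctuation in 'words_joined' (defined here but not called in this notebook)
def preprocessing_text(df):
    # Replace every punctuation character with '-' (regex=True is needed for patterns in pandas >= 2.0)
    df['words_joined'] = df['words_joined'].str.replace(r'[^\w\s]', '-', regex=True)
    # Mark spaces with '_' so that '-_-' identifies punctuation on both sides of a space
    df['words_joined'] = df['words_joined'].str.replace(' ', '_', regex=False)
    df['words_joined'] = df['words_joined'].str.replace('-_-', '', regex=False)
    # Turn the remaining punctuation markers into spaces
    df['words_joined'] = df['words_joined'].str.replace('-', ' ', regex=False)
    return df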
In [46]:
# 'words_joined' already holds one string per row, so join the rows directly
text = ' '.join(join_df2['words_joined'])
In [47]:
married_df = join_df2[join_df2['marital'] == 'married']
single_df = join_df2[join_df2['marital'] == 'single']
divorced_df = join_df2[join_df2['marital'] == 'divorced']
# The category label in the data is spelled 'separated'; the misspelled label matched no rows
seperated_df = join_df2[join_df2['marital'] == 'separated']
widowed_df = join_df2[join_df2['marital'] == 'widowed']

Data Analyzing

Analyzing the frequently used words among married people when they feel happy

In [48]:
married_df.head()
Out[48]:
marital marital_code cleaned_hm predicted_category predicted_category_code gender words words_joined
15 married 1 When my family plan a abroad tour with me affection 1 m [When, my, family, plan, a, abroad, tour, with... family plan abroad tour
19 married 1 When my house ready to live with my family affection 1 m [When, my, house, ready, to, live, with, my, f... house ready live family
23 married 1 When my friend meet me today with expensive gi... bonding 2 m [When, my, friend, meet, me, today, with, expe... friend meet expensive gift
25 married 1 I was very happy when my son playing with whol... affection 1 m [I, was, very, happy, when, my, son, playing, ... son playing whole
33 married 1 When I shifted my new home achievement 0 m [When, I, shifted, my, new, home] shifted home
In [49]:
#Split the phrases into individual words, creating a list for each entry
In [50]:
words_split = married_df['words_joined'].str.split().tolist()
In [51]:
#Combined these lists into a single list to have all words together
In [52]:
all_words = [word for sublist in words_split for word in sublist]
In [53]:
#Used a special tool called Counter to help us count each word's occurrences.
In [54]:
from collections import Counter
In [55]:
word_counts = Counter(all_words)
In [56]:
top_15_words = word_counts.most_common(15)
In [57]:
print(top_15_words)
[('time', 459), ('wife', 364), ('husband', 340), ('son', 326), ('work', 311), ('last', 309), ('good', 303), ('family', 297), ('daughter', 294), ('happy.', 282), ('first', 252), ('really', 252), ('get', 229), ('home', 222), ('old', 214)]
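The list above still contains 'happy.' even though 'happy' is in the custom stop-word list, because `str.split()` leaves punctuation attached to the words. A minimal sketch of stripping punctuation before the stop-word check, using only the standard library (the helper `tokenize`, the small stop list, and the example sentences are illustrative assumptions):

```python
import string
from collections import Counter

# Small illustrative stop list; the notebook uses the NLTK list plus custom words.
stop_words = {"i", "was", "so", "a", "my", "happy", "had", "with"}

def tokenize(text):
    """Split on whitespace, strip surrounding punctuation, lowercase, drop stop words."""
    tokens = []
    for raw in text.split():
        word = raw.strip(string.punctuation).lower()
        if word and word not in stop_words:
            tokens.append(word)
    return tokens

# Made-up example sentences.
moments = ["I was so happy.", "I had lunch with my wife.", "My wife was happy!"]
counts = Counter(tok for m in moments for tok in tokenize(m))
print(counts.most_common(3))
```

With this approach 'wife.' and 'wife' are counted together, and 'happy.' is filtered out like 'happy'.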
In [58]:
word_counts = Counter(married_df['words_joined'])
In [59]:
join_df2.columns
Out[59]:
Index(['marital', 'marital_code', 'cleaned_hm', 'predicted_category',
       'predicted_category_code', 'gender', 'words', 'words_joined'],
      dtype='object')
In [60]:
#Create a unified text block from all the individual phrases or words 
In [61]:
married_text = " ".join(hm for hm in married_df['words_joined'].dropna())
single_text = " ".join(hm for hm in single_df['words_joined'].dropna())
divorced_text = " ".join(hm for hm in divorced_df['words_joined'].dropna())
seperated_text = " ".join(hm for hm in seperated_df['words_joined'].dropna())
widowed_text = " ".join(hm for hm in widowed_df['words_joined'].dropna())
In [62]:
#create wordcloud
In [63]:
wordcloud = WordCloud(width = 800, height = 800, 
                background_color ='white', 
                stopwords = set(), 
                min_font_size = 10).generate(married_text)
In [64]:
plt.figure(figsize = (3, 5), facecolor = None) 
plt.imshow(wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 

plt.show()
In [65]:
#Finds and gathers sentences that include common words
In [66]:
high_freq_words = ['husband', 'wife', 'work','daughter','son','time','new']
sentences_with_high_freq_words = {word: [] for word in high_freq_words}
for index, row in married_df.iterrows():
    for word in high_freq_words:
        if word in row['cleaned_hm'].lower():
            sentences_with_high_freq_words[word].append(row['cleaned_hm'])

#The output is quite lengthy, so I've added comment markers to shorten it. You can remove these comment markers to see the full results.

#for word, sentences in sentences_with_high_freq_words.items():
#    print(f"Sentences containing the word '{word}':")
#    for sentence in sentences:
#        print(f"- {sentence}")
#    print("\n")
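The check `word in row['cleaned_hm'].lower()` is a substring test, so 'son' also matches sentences containing 'person' or 'reason'. A sketch of a whole-word match using a regular expression with word boundaries (the helper name `contains_word` and the example sentences are illustrative assumptions):

```python
import re

def contains_word(sentence, word):
    """Return True if `word` occurs in `sentence` as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None

# Made-up example sentences.
print(contains_word("My son made pancakes.", "son"))
print(contains_word("I met a nice person.", "son"))
```

A whole-word match does not catch plurals such as 'sons', so those would need to be listed separately if they should count.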

For married individuals, their happiness largely stems from their children, their partner, and their work.

Analyzing the frequently used words among single people when they feel happy

In [67]:
single_df = join_df2[join_df2['marital'] == 'single']
In [68]:
words_split = single_df['words_joined'].str.split().tolist()
In [69]:
all_words = [word for sublist in words_split for word in sublist]
In [70]:
word_counts = Counter(all_words)
In [71]:
top_15_words = word_counts.most_common(15)
In [72]:
print(top_15_words)
[('time', 469), ('friend', 438), ('really', 406), ('work', 396), ('good', 351), ('last', 331), ('friends', 300), ('first', 284), ('get', 278), ('happy.', 255), ('felt', 230), ('finally', 228), ('see', 220), ('came', 215), ('bought', 211)]
In [73]:
single_text = " ".join(hm for hm in single_df['words_joined'].dropna())
In [74]:
wordcloud = WordCloud(width = 800, height = 800, 
                background_color ='white', 
                stopwords = set(), 
                min_font_size = 10).generate(single_text)
In [75]:
plt.figure(figsize = (3, 4), facecolor = None) 
plt.imshow(wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 

plt.show()
In [76]:
high_freq_words = ['work', 'friend', 'found']
sentences_with_high_freq_words = {word: [] for word in high_freq_words}
for index, row in single_df.iterrows():
    for word in high_freq_words:
        if word in row['cleaned_hm'].lower():
            sentences_with_high_freq_words[word].append(row['cleaned_hm'])

#for word, sentences in sentences_with_high_freq_words.items():
#    print(f"Sentences containing the word '{word}':")
#    for sentence in sentences:
#        print(f"- {sentence}")
#    print("\n")

For single individuals, their happiness largely stems from their friends and their work.

Analyzing the frequently used words among divorced people when they feel happy

In [77]:
divorced_df = join_df2[join_df2['marital'] == 'divorced']
In [78]:
divorced_df.head()
Out[78]:
marital marital_code cleaned_hm predicted_category predicted_category_code gender words words_joined
158 divorced 0 I made vacation plans with my daughter today f... affection 1 f [I, made, vacation, plans, with, my, daughter,... vacation plans daughter Florida July.
1152 divorced 0 I picked my daughter up from the airport and w... affection 1 f [I, picked, my, daughter, up, from, the, airpo... picked daughter airport fun good conversation ...
2411 divorced 0 I had the weekly high score in an online game ... bonding 2 m [I, had, the, weekly, high, score, in, an, onl... weekly high score online game play friends.
2430 divorced 0 I met some friends for dinner and drinks follo... bonding 2 m [I, met, some, friends, for, dinner, and, drin... met friends dinner drinks followed watching li...
2434 divorced 0 My teenage daughter finished her hardest exams... affection 1 m [My, teenage, daughter, finished, her, hardest... teenage daughter finished hardest exams comple...
In [79]:
words_split = divorced_df['words_joined'].str.split().tolist()
In [80]:
all_words = [word for sublist in words_split for word in sublist]
word_counts = Counter(all_words)
top_15_words = word_counts.most_common(15)
print(top_15_words)
[('work', 37), ('good', 26), ('daughter', 25), ('friend', 22), ('get', 22), ('time', 21), ('really', 20), ('first', 19), ('finally', 18), ('last', 17), ('see', 17), ('son', 17), ('happy.', 16), ('dinner', 15), ('came', 15)]
In [81]:
divorced_text = " ".join(hm for hm in divorced_df['words_joined'].dropna())
In [82]:
wordcloud = WordCloud(width = 800, height = 800, 
                background_color ='white', 
                stopwords = set(), 
                min_font_size = 10).generate(divorced_text)
In [83]:
plt.figure(figsize = (3, 4), facecolor = None) 
plt.imshow(wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 

plt.show()
In [84]:
high_freq_words = ['work', 'son', 'daughter']
sentences_with_high_freq_words = {word: [] for word in high_freq_words}
for index, row in divorced_df.iterrows():
    for word in high_freq_words:
        if word in row['cleaned_hm'].lower():
            sentences_with_high_freq_words[word].append(row['cleaned_hm'])
'''
for word, sentences in sentences_with_high_freq_words.items():
    print(f"Sentences containing the word '{word}':")
    for sentence in sentences:
        print(f"- {sentence}")
    print("\n")
'''
Out[84]:
'\nfor word, sentences in sentences_with_high_freq_words.items():\n    print(f"Sentences containing the word \'{word}\':")\n    for sentence in sentences:\n        print(f"- {sentence}")\n    print("\n")\n'

For divorced individuals, their happiness largely stems from their children and their work.

Analyzing the frequently used words among separated people when they feel happy

In [85]:
separated_df= join_df2[join_df2['marital'] == 'separated']
In [86]:
words_split = separated_df['words_joined'].str.split().tolist()
In [87]:
all_words = [word for sublist in words_split for word in sublist]
word_counts = Counter(all_words)
top_15_words = word_counts.most_common(15)
print(top_15_words)
[('friend', 10), ('time', 8), ('good', 8), ('go', 8), ('last', 6), ('get', 6), ('daughter', 5), ('really', 5), ('old', 4), ('watched', 4), ('talked', 4), ('free', 4), ('best', 4), ('son', 4), ('first', 4)]
In [88]:
separated_text = " ".join(hm for hm in separated_df['words_joined'].dropna())
In [89]:
wordcloud = WordCloud(width = 800, height = 800, 
                background_color ='white', 
                stopwords = set(), 
                min_font_size = 10).generate(separated_text)
In [90]:
plt.figure(figsize = (3, 4), facecolor = None) 
plt.imshow(wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 

plt.show()
In [91]:
high_freq_words = ['friend', 'good', 'time']
sentences_with_high_freq_words = {word: [] for word in high_freq_words}
for index, row in separated_df.iterrows():
    for word in high_freq_words:
        if word in row['cleaned_hm'].lower():
            sentences_with_high_freq_words[word].append(row['cleaned_hm'])
'''
for word, sentences in sentences_with_high_freq_words.items():
    print(f"Sentences containing the word '{word}':")
    for sentence in sentences:
        print(f"- {sentence}")
    print("\n")
'''
Out[91]:
'\nfor word, sentences in sentences_with_high_freq_words.items():\n    print(f"Sentences containing the word \'{word}\':")\n    for sentence in sentences:\n        print(f"- {sentence}")\n    print("\n")\n'

For those who are separated, their moments of joy tend to center more around the company of friends and the value they place on their personal time.

Analyzing the frequently used words among widowed people when they feel happy

In [92]:
widowed_df = join_df2[join_df2['marital'] == 'widowed']
In [93]:
words_split = widowed_df['words_joined'].str.split().tolist()
In [94]:
all_words = [word for sublist in words_split for word in sublist]
word_counts = Counter(all_words)
top_15_words = word_counts.most_common(15)
print(top_15_words)
[('house', 5), ('son', 4), ('friend', 4), ('took', 4), ('bought', 3), ('two', 3), ('months', 3), ('wanted', 3), ('walked', 3), ('finally', 3), ('get', 3), ('really', 2), ('great.', 2), ('Seeing', 2), ('daughter', 2)]
In [95]:
widowed_text = " ".join(hm for hm in widowed_df['words_joined'].dropna())
In [96]:
wordcloud = WordCloud(width = 800, height = 800, 
                background_color ='white', 
                stopwords = set(), 
                min_font_size = 10).generate(widowed_text)
In [97]:
plt.figure(figsize = (3, 4), facecolor = None) 
plt.imshow(wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 

plt.show()
In [98]:
high_freq_words = ['house', 'daughter', 'son']
sentences_with_high_freq_words = {word: [] for word in high_freq_words}
for index, row in widowed_df.iterrows():
    for word in high_freq_words:
        if word in row['cleaned_hm'].lower():
            sentences_with_high_freq_words[word].append(row['cleaned_hm'])
'''
for word, sentences in sentences_with_high_freq_words.items():
    print(f"Sentences containing the word '{word}':")
    for sentence in sentences:
        print(f"- {sentence}")
    print("\n")
'''
Out[98]:
'\nfor word, sentences in sentences_with_high_freq_words.items():\n    print(f"Sentences containing the word \'{word}\':")\n    for sentence in sentences:\n        print(f"- {sentence}")\n    print("\n")\n'

For widowed individuals, their happiness largely stems from their house and their children.
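The five per-group sections above repeat the same word-counting steps. A minimal sketch of how they could be expressed as one loop over marital statuses, on made-up toy rows (the frame `toy`, the helper `top_words`, and the dictionary `top_by_status` are illustrative assumptions):

```python
from collections import Counter

import pandas as pd

# Toy stand-in for join_df2 with already cleaned text (values are made up).
toy = pd.DataFrame({
    "marital": ["single", "single", "married", "married"],
    "words_joined": ["lunch friends", "coffee friends", "dinner wife", "movie wife"],
})

def top_words(texts, n=15):
    """Count whitespace-separated words across all texts and return the n most common."""
    counts = Counter(word for text in texts for word in text.split())
    return counts.most_common(n)

# One pass over the groups instead of one block of cells per marital status.
top_by_status = {
    status: top_words(group["words_joined"], n=1)
    for status, group in toy.groupby("marital")
}
print(top_by_status)
```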